Text Normalization System for Bangla
Abstract
This paper describes a text normalization system for the Bangla language (exonym: Bengali), built by identifying the semiotic classes in a Bangla text corpus. After the semiotic classes were identified, a set of rules was written for tokenization and verbalization. This work is relevant to Text-to-Speech (TTS) systems as well as to building language models for speech recognition.
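As a rough illustration of the pipeline the abstract outlines (tokenize, assign a semiotic class, then verbalize), here is a minimal Python sketch. The class names `CARDINAL`, `DECIMAL`, and `PLAIN`, the regex rules, and the digit-by-digit reading are assumptions made for illustration only; the paper's actual class inventory and rules are not given in this abstract, and real Bangla cardinals are read as whole numbers with many irregular forms rather than digit by digit.

```python
import re

# Bangla digits and their spoken names, indexed 0-9.
BANGLA_DIGITS = "০১২৩৪৫৬৭৮৯"
DIGIT_WORDS = ["শূন্য", "এক", "দুই", "তিন", "চার", "পাঁচ", "ছয়", "সাত", "আট", "নয়"]

# Hypothetical semiotic classes, checked in order (not the paper's rule set).
RULES = [
    ("DECIMAL", re.compile(r"[০-৯]+\.[০-৯]+")),
    ("CARDINAL", re.compile(r"[০-৯]+")),
]


def classify(token):
    """Return the first class whose pattern matches the whole token."""
    for label, pattern in RULES:
        if pattern.fullmatch(token):
            return label
    return "PLAIN"


def read_digits(digits):
    """Spell out a digit string one digit at a time (a simplification)."""
    return " ".join(DIGIT_WORDS[BANGLA_DIGITS.index(d)] for d in digits)


def verbalize(token, label):
    """Expand a token into words according to its semiotic class."""
    if label == "CARDINAL":
        return read_digits(token)
    if label == "DECIMAL":
        whole, frac = token.split(".")
        # "দশমিক" is the spoken form of the decimal point.
        return f"{read_digits(whole)} দশমিক {read_digits(frac)}"
    return token  # PLAIN tokens pass through unchanged


def normalize(text):
    """Whitespace-tokenize, classify, and verbalize each token."""
    return " ".join(verbalize(t, classify(t)) for t in text.split())
```

Calling `normalize` on a sentence leaves ordinary words untouched and expands only the tokens that match a rule. Tokens with attached punctuation fall through to `PLAIN` in this sketch, which is one reason a practical tokenizer needs more than whitespace splitting.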
Similar resources
Development of annotated Bangla speech corpora
This paper describes the development procedure of three different Bangla read speech corpora which can be used for phonetic research and developing speech applications. Several criteria were maintained in the corpora development process that includes considering the phonetic and prosodic features during text selection. On the other hand, a specification was maintained in the recording phase as ...
Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla
We present a text-to-speech (TTS) system designed for the dialect of Bengali spoken in Bangladesh. This work is part of an ongoing effort to address the needs of new under-resourced languages. We propose a process for streamlining the bootstrapping of TTS systems for under-resourced languages. First, we use crowdsourcing to collect the data from multiple ordinary speakers, each speaker recordin...
TTS for Low Resource Languages: A Bangla Synthesizer
We present a text-to-speech (TTS) system designed for the dialect of Bengali spoken in Bangladesh. This work is part of an ongoing effort to address the needs of under-resourced languages. We propose a process for streamlining the bootstrapping of TTS systems for under-resourced languages. First, we use crowdsourcing to collect the data from multiple ordinary speakers, each speaker recording sm...
Automatic Extraction of Compound Verbs from Bangla Corpora
In this paper we present a rule-based technique for the automatic extraction of Bangla compound verbs from raw text corpora. In our work we have (a) proposed rules through which a system could automatically identify Bangla CVs from texts. These rules will be established on the basis of syntactic interpretation of sentences, (b) we shall explain problems of CV identification subject to the seman...
Bangla Text to Speech using Festival
This paper describes the development of the first, usable, open source and freely available Bangla Text to Speech (TTS) system for Bangladeshi Bangla using the open source Festival TTS engine. Besides that, this paper also discusses a few practical applications that use this system. This system is developed using diphone concatenation approach in its waveform generation phase. Construction of a...